Experiments in Data Compression V
(or "It seemed like a good idea at the time")

by John Kormylo
E.L.F. Software


State of the Art

Since Report IV, I have fixed a couple of bugs in the "Smart" algorithm.
The current comparisons are as follows:

           | DEMONSTR.DOC | ELFBACK.PRG | WORDLIST.TXT |
-----------+--------------+-------------+--------------+
 ZIP       |        25134 |       32417 |       496876 |
 Best 16K  |        21960 |       32372 |       446606 |
 Smart 16K |        22076 |       32359 |       489471 |
 Smart 8K  |        22214 |       32190 |       483486 |
-----------+--------------+-------------+--------------+
 Adapt 8K  |        22317 |       32057 |       484332 |
-----------+--------------+-------------+--------------+

The "16K" and "8K" refer to the number of entries in the main dictionary.
Note that the "Smart" method now outperforms the "Best" method on one of
the files.

Unfortunately, the results were not so good when archiving ELFARC v1.4,
as shown by the following table:

           | ELFARC.PRG | ELFARC.RSC | READ_ME   |
-----------+------------+------------+-----------+
 ZIP       |      32988 |       6255 |      1000 |
 Best 16K  |      33143 |       6314 |      1037 |
 Smart 8K  |      33107 |       6305 |      1028 |
-----------+------------+------------+-----------+
 Adapt 8K  |      32978 |       6192 |      1043 |
-----------+------------+------------+-----------+


Emulating the Competition

A number of tests were made to make LZA more like ZIP.  Most changes
(such as increasing the maximum match length to 256 bytes or adding every
possible string to the dictionary) resulted in a loss of performance.

One feature of LZA which has never really been tested for effectiveness
is the "new character" code.  Instead of assigning an initial count of 1
to each of the 320 possible codes (256 ASCII and 64 length), LZA uses a
"new character" code which is followed by the character to be added.  The
reason for this is to improve performance on text files, where only about
96 ASCII characters are ever used.  However, it also imposes an overhead
which is never recovered for binary files or short text files, which are
primarily what archive programs are used for.

Removing the "new character" code is a fundamental change in the LZA
format.  Consequently, a number of other fundamental changes which had
not been implemented previously simply because they were fundamental
(such as sorting the codes by frequency) were also implemented.

First, the code set was reduced to 258 (256 ASCII and one each for the
FIFO buffer and main dictionary).  The length code was implemented
separately, also fully adaptive.  This was done so that the ASCII codes
would not be penalized by counts associated with all the length codes
(using 256 length codes instead of 64 would have added 1 bit to every
character, at least initially).  The FIFO lag and dictionary range codes
remained as before.

The results for this new method are shown in the above tables as
"Adapt 8K."  As expected, it performed slightly worse on text files, but
better on binary files.  More importantly, it beat ZIP on every file but
one.


FIFO Buffer - Layered Binary Search Tree

While using a hash table for the FIFO buffer is simpler and probably
faster for searches, a layered binary search tree takes much less RAM and
adds and removes entries much faster.  The resulting tree will use less
than 2K RAM, as opposed to 64K RAM for the hash table (for a maximum
match length of 65 bytes).  The speed advantage is best illustrated by
noting that the entire FIFO buffer is replaced every 64 entries.
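Before getting into the tree itself, here is a rough sketch of that
turnover (the names fifo_update, fifo_add, fifo_remove and the circular
index are mine for illustration, not the actual LZA code): each new
string simply reuses the slot of the oldest one.

    #define FIFO_SIZE 64                        /* entries in the FIFO buffer */

    static unsigned char *fifo_text[FIFO_SIZE]; /* string pointers, one per slot */
    static int fifo_head = 0;                   /* slot holding the oldest string */

    extern void fifo_remove(int index);  /* delete a slot's string from the tree */
    extern void fifo_add(int index);     /* insert a slot's string into the tree */

    /* Slide the FIFO window forward by one string. */
    void fifo_update(unsigned char *newtext)
    {
        fifo_remove(fifo_head);          /* oldest entry goes out of the tree */
        fifo_text[fifo_head] = newtext;  /* reuse the slot for the newest string */
        fifo_add(fifo_head);
        fifo_head = (fifo_head + 1) % FIFO_SIZE;
    }

Every 64 such updates the table contents have turned over completely, so
adding and removing entries quickly matters as much as searching quickly.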
The search tree used for the FIFO table differs from that used in the
main dictionary in that it includes an index into the table of string
pointers.

    typedef struct fifo {
        struct fifo *child1;    /* less than */
        struct fifo *child2;    /* greater than */
        struct fifo *next;      /* start of next tree */
        unsigned char *text;    /* string pointer */
        unsigned int sum;       /* sum of all children */
        unsigned char len;      /* number of common chars */
        unsigned char test;     /* first uncommon char */
        int index;              /* fifo table index */
    } FIFO;

One could drop the text pointer and use the text pointer array index
instead, but it is faster to keep it.

This search tree also differs from the main dictionary in how it adds and
removes entries.  The main dictionary used a fixed array of 8K (or 16K)
entries, with the duplicate nodes coming from the free structure list.
When removing a node, one must also search the remaining layers for
duplicates.  The FIFO buffer allocates all of its nodes from the free
structure list.  Whenever a new string is added, the text and index
values for its matches within each layer are set to this (latest) entry,
so that one will always find the most recent match.  This also means that
when it is time to remove an entry, there will be no duplicates remaining
in the tree.  There was about a 15% improvement in speed from this
change.

One could use a common tree structure for both the FIFO buffer and main
dictionary by adding the index to the main dictionary and replacing or
augmenting the text entry with an array of string pointers (referred to
by index).  While this would simplify the code, allowing common functions
for adding and removing entries, it would be slower and use more RAM.


Tweaking the Code

Using Pure Profiler, I was able to improve the speed of some of the code.
One thing I noticed is that Pure C compiled

    while (p) {
        if (p->test == ref)
            break;
        else if (p->test > ref)
            p = p->child1;
        else
            p = p->child2;
    }

using two compares instead of one compare and two branches.  Since over
10% of the CPU time is spent on this type of operation (namely binary
searches), I tried replacing this loop with an assembly language
function.  However, the overhead for calling the function roughly
equalled the advantage of eliminating the duplicate compare.
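For what it's worth, a portable alternative (just a sketch, not something
tested in LZA itself; the temporary d and the int casts are my own) would
be to compute the difference once and branch on its sign, which at least
gives the compiler a single subtraction to reuse for both tests:

    while (p) {
        int d = (int)p->test - (int)ref;  /* one subtraction feeds both tests */
        if (d == 0)
            break;                        /* exact match found */
        p = (d > 0) ? p->child1 : p->child2;
    }

Whether Pure C actually turns this into one compare and two branches
would of course have to be checked with the profiler.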